CLOCK DISTRIBUTION CIRCUITS AND METHODS OF OPERATING SAME
THAT USE MULTIPLE CLOCK CIRCUITS CONNECTED BY PHASE
DETECTOR CIRCUITS TO GENERATE AND SYNCHRONIZE LOCAL CLOCK
SIGNALS

RELATED APPLICATION 
This application claims the benefit of U.S. Provisional Application No.
60/221,709, filed July 31, 2000, the disclosure of which is hereby incorporated herein
by reference.

BACKGROUND OF THE INVENTION 
The present invention relates generally to the field of electronic clocks, and, 
more particularly, to distribution of an electronic clock in an electronic circuit, such as 
an integrated circuit. 

The clock distribution network of a microprocessor may use a significant
fraction of the total chip power and may have a substantial impact on the overall
performance of the microprocessor. For example, the 72-Watt, 600 MHz Alpha
processor dissipates approximately 16 Watts in global clock distribution and another
23 Watts in generating local clocks. Thus, more than half of the Alpha processor's
power is used in driving the clock network. Moreover, the uncertainty in a global
clock signal may be approximately 10% of the clock period, which may translate into
an approximately 10% reduction in maximum operating speed.
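The power and timing figures above can be checked with a few lines of arithmetic. The sketch below is illustrative only: applying the approximately 10% uncertainty figure to the Alpha's 600 MHz clock is an assumption made here to show the size of the timing loss per cycle; the text does not state that pairing directly.

```python
# Figures quoted above for the 72 W, 600 MHz Alpha processor.
TOTAL_POWER_W = 72.0
GLOBAL_CLOCK_W = 16.0   # global clock distribution
LOCAL_CLOCK_W = 23.0    # local clock generation
CLOCK_FREQ_GHZ = 0.600

clock_power_w = GLOBAL_CLOCK_W + LOCAL_CLOCK_W
clock_fraction = clock_power_w / TOTAL_POWER_W   # share of total power spent on clocking

# Assumption for illustration: apply the ~10% uncertainty figure to this clock.
UNCERTAINTY_FRACTION = 0.10
period_ns = 1.0 / CLOCK_FREQ_GHZ
uncertainty_ps = UNCERTAINTY_FRACTION * period_ns * 1000.0

# Logic needing delay L must fit in (1 - 0.10) * T, so T >= L / 0.9 and the
# maximum clock frequency falls to 0.9 / L: a ~10% speed reduction.
speed_factor = 1.0 - UNCERTAINTY_FRACTION

print(f"clock power: {clock_power_w:.0f} W of {TOTAL_POWER_W:.0f} W ({clock_fraction:.1%})")
print(f"uncertainty at 600 MHz: ~{uncertainty_ps:.0f} ps per cycle")
print(f"maximum frequency scaled by {speed_factor:.2f}")
```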

Modern microprocessors may use a balanced tree to distribute the clock.
Because the delays to all nodes may be nominally equal, a balanced tree may be
expected to exhibit relatively low skew. At gigahertz clock speeds, however, an
increasing fraction of skew and jitter may come from random variations in gate and
interconnect delay. Typically, most of the jitter in a clock tree is introduced by
buffers and by inter-line coupling to the clock wires, and relatively little comes from
noise in the source oscillator. Therefore, conventional clock designs may focus on
matching the delay along the various clock paths. As clock speed increases, however,
the signal delay across a chip may become comparable to a clock cycle. Because the
error in a global clock generally grows with the total path delay, the global clock
error may constitute a relatively large fraction of the global clock cycle. Accordingly,
there exists a need for improved clock distribution circuits and methods of operating
same.



SUMMARY OF THE INVENTION 
Embodiments of the present invention provide clock distribution circuits,
systems, and methods of operating same that use multiple clock circuits that are
connected by phase detector circuits to generate and synchronize local clock signals.
For example, in some embodiments, a clock distribution circuit comprises a first clock
circuit that is configured to generate a first clock signal in response to a first error
signal, and a second clock circuit that is configured to generate a second clock signal
in response to the first error signal. A first phase detector circuit connects the first
clock circuit to the second clock circuit, and is configured to generate the first error
signal in response to the first and the second clock signals.

In other embodiments of the present invention, a third clock circuit is
configured to generate a third clock signal in response to a second error signal, and a
second phase detector circuit connects the first clock circuit to the third clock circuit.
In addition, the second phase detector circuit generates the second error signal in
response to the first and the third clock signals, and the first clock circuit is further
configured to generate the first clock signal in response to the first and the second
error signals.

By using multiple clock circuits to generate local, synchronized clock signals,
chip-length clock lines that may couple in jitter may be avoided. Moreover, skew may
be limited to that resulting from asymmetries in one or more phase detector circuits
instead of mismatches in physically separated buffers. Because the clock signal is
regenerated at each clock circuit, high-frequency jitter may not accumulate with
distance from the clock source.

In other embodiments of the present invention, the first clock circuit comprises
a loop filter circuit, which is configured to generate a control signal at an output
terminal thereof in response to the first and the second error signals, and an oscillator
that is configured to generate the first clock signal in response to the control signal.

In other embodiments of the present invention, the first clock circuit further
comprises a summation circuit that is configured to add the first and the second error
signals to generate a composite error signal. The loop filter circuit is further
configured to generate the control signal in response to the composite error signal.

In still other embodiments of the present invention, the loop filter circuit
comprises a first amplifier circuit and a second amplifier circuit that are connected at
the output terminal of the loop filter circuit and are both responsive to the composite
error signal.

In still other embodiments of the present invention, the first phase detector
circuit comprises a first pulse generator circuit that is configured to generate a first
pulse signal in response to the first clock signal, and a second pulse generator circuit
that is configured to generate a second pulse signal in response to the second clock
signal. The first phase detector circuit further comprises an arbiter circuit that is
configured to generate the first error signal in response to the first pulse signal and the
second pulse signal.

Although described above primarily with respect to apparatus/device aspects
of the present invention, it should be understood that the present invention may also
be embodied as systems and methods for distributing a clock signal.



BRIEF DESCRIPTION OF THE DRAWINGS
Other features of the present invention will be more readily understood from
the following detailed description of specific embodiments thereof when read in
conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram that illustrates clock distribution circuits in 
accordance with embodiments of the present invention; 
FIG. 2 is a block diagram that illustrates clock circuits and phase detector
circuits in accordance with embodiments of the present invention;

FIG. 3 is a circuit schematic that illustrates phase detector circuits in
accordance with embodiments of the present invention;

FIG. 4 is a graph of output current versus input signal phase difference for
phase detector circuits in accordance with embodiments of the present invention;

FIG. 5 is a graph that illustrates clock signal convergence for clock
distribution circuits in accordance with embodiments of the present invention;

FIG. 6 is a circuit schematic of loop filter circuits in accordance with
embodiments of the present invention;

FIG. 7 is a circuit schematic of oscillators in accordance with embodiments
of the present invention; and

FIG. 8 is an oscilloscope graph of clock signals generated by clock
distribution circuits in accordance with embodiments of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS 



While the invention is susceptible to various modifications and alternative
forms, specific embodiments thereof are shown by way of example in the drawings
and will herein be described in detail. It should be understood, however, that there is
no intent to limit the invention to the particular forms disclosed, but on the contrary,
the invention is to cover all modifications, equivalents, and alternatives falling within
the spirit and scope of the invention as defined by the claims. It will also be
understood that when an element is referred to as being "connected" or "coupled" to
another element, it can be directly connected or coupled to the other element or
intervening elements may be present. In contrast, when an element is referred to as
being "directly connected" or "directly coupled" to another element, there are no
intervening elements present. Like reference numbers signify like elements
throughout the description of the figures.

Referring now to FIG. 1, a clock distribution circuit 12, in accordance with
embodiments of the present invention, comprises an array of phase locked loop
circuits (PLLs). More specifically, independent clock circuits 14 (e.g., 14a and 14b)
may generate substantially synchronized clock signals at multiple nodes across an
integrated circuit device 15, with each clock circuit distributing its clock signal to only
a small section (e.g., a tile) of the device. Each of the phase detector circuits 16 (e.g.,
16a, 16b, 16c, and 16d) connects one of the clock circuits 14 to another one of the
clock circuits 14 and generates an error signal that is used to adjust the frequencies of
the clock signals generated by the connected clock circuits. Although the clock
distribution circuit 12 is shown in FIG. 1 in a square configuration in which each
clock circuit is connected to four other clock circuits through four separate phase
detector circuits, it will be understood that the clock circuits 14 may be connected in
other geometric arrangements in accordance with embodiments of the present
invention.
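The square arrangement of FIG. 1 can be described as a graph whose vertices are clock circuits 14 and whose edges are phase detector circuits 16. The following sketch builds that graph for a rows × cols array; the helper names are hypothetical and only illustrate the topology. In a finite array only interior nodes have four detectors; edge nodes have three and corner nodes two.

```python
from itertools import product

def mesh_phase_detectors(rows, cols):
    """Edges of a square mesh: one phase detector between each pair of
    horizontally or vertically adjacent clock circuits."""
    edges = []
    for r, c in product(range(rows), range(cols)):
        if c + 1 < cols:
            edges.append(((r, c), (r, c + 1)))  # detector to the east neighbour
        if r + 1 < rows:
            edges.append(((r, c), (r + 1, c)))  # detector to the south neighbour
    return edges

def detectors_per_node(edges):
    """Map each clock circuit to the neighbours it shares a phase detector with."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj

edges = mesh_phase_detectors(4, 4)
adj = detectors_per_node(edges)
print(len(edges), "phase detectors for", len(adj), "clock circuits")
```

An r × c mesh has 2rc - r - c detectors, so for large arrays the count approaches two per clock circuit, which matches counting one clock circuit and two phase detector circuits per tile in the experimental area estimate.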

When configuring the clock circuits 14 and the phase detectors 16 in the clock
distribution circuit 12, both small-signal and large-signal performance may be
considered. As used herein, small-signal refers to the state in which the phase
differences between the clock signals generated by the clock circuits 14 are relatively
small, and the clock circuits 14 can converge to a lock state in which the clock signals
are substantially in phase with one another. Conversely, large-signal refers to the state
in which the phase difference between two or more clock circuits is relatively large,
and the clock circuits 14 may be susceptible to a phenomenon called "mode lock," in
which the clock signals are not in phase with one another but nevertheless have a net
phase error of approximately zero. In general, small-signal noise performance may be
enhanced by increasing the number of connections between the clock circuits 14
through the phase detectors 16. With regard to large-signal performance, G. A. Pratt
and J. Nguyen have shown in their paper entitled "Distributed synchronous clocking,"
IEEE Trans. Parallel and Distributed Systems, Mar. 1995, the disclosure of which is
hereby incorporated herein by reference, that for a system in mode lock, there must be
a phase difference $\phi$ between two clock circuits such that $\phi > 2\pi/n$, where $n$ is the
number of nodes in the largest minimal loop in the network. A minimal loop is
defined as a loop that cannot be decomposed into multiple loops. A detailed
mathematical treatment of both small-signal and large-signal performance of
exemplary clock distribution circuits 12, in accordance with embodiments of the
present invention, is provided in an article by the present inventors, V. Gutnik and A.
Chandrakasan, entitled "Active GHz Clock Network Using Distributed PLLs," IEEE
Journal of Solid-State Circuits, Nov. 2000, the disclosure of which is hereby
incorporated herein by reference.

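The 2π/n bound follows from a pigeonhole argument: going once around a minimal loop of n clock circuits, the wrapped phase differences between neighbors must add up to an integer multiple 2πk of a full cycle. A mode-locked loop has k ≠ 0, so at least one of the n differences must be at least 2π/n in magnitude. The sketch below (function names hypothetical) computes k and the largest neighbor phase difference for one loop, using the 4-node minimal loop of the square mesh of FIG. 1 as the example.

```python
import math

def wrap(x):
    """Wrap a phase difference into [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi

def loop_winding(phases):
    """For the phases of n clock circuits taken in order around one loop, return
    (winding number k, largest |phase difference| between neighbours)."""
    n = len(phases)
    diffs = [wrap(phases[(i + 1) % n] - phases[i]) for i in range(n)]
    winding = round(sum(diffs) / (2 * math.pi))
    return winding, max(abs(d) for d in diffs)

# Mode lock in a 4-node loop: each node is a quarter cycle ahead of the previous
# one, so with a symmetric detector the errors from the neighbour ahead and the
# neighbour behind cancel, even though no two neighbours are in phase.
n = 4
mode_locked = [2 * math.pi * k / n for k in range(n)]
k, worst = loop_winding(mode_locked)
print(k, worst, 2 * math.pi / n)
```

For a properly locked loop the winding number is zero and all neighbor differences can be small; a nonzero winding number forces some neighbor pair to be at least 2π/n apart.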
An exemplary embodiment of a clock circuit, such as the clock circuit 14a of
FIG. 1, in accordance with the present invention, is shown in more detail in FIG. 2.
It will be understood, however, that other clock circuit embodiments may also be
used. The clock circuit 14a comprises a summation circuit 18, a loop filter circuit 22,
and an oscillator 24 that are configured as shown. The summation circuit 18 is
configured to generate a composite error signal by adding the error signals from the
four phase detector circuits 16a, 16b, 16c, and 16d. The loop filter circuit 22
generates a control signal in response to the composite error signal, which is used by
the oscillator 24 to adjust the frequency of the clock signal φ0 output from the
oscillator 24. The clock signal φ0 that is output from the clock circuit 14a is fed back
to the four phase detector circuits 16a, 16b, 16c, and 16d, which generate respective
error signals based on the phase difference between the clock signal φ0 and the clock
signals φ1, φ2, φ3, and φ4 generated by neighboring clock circuits 14 in the clock
distribution circuit 12.
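The feedback loop of FIG. 2 (phase detectors, summation circuit 18, loop filter 22, oscillator 24, and feedback to the detectors) can be illustrated with a small-signal, discrete-time model of a whole array. This is a simplification under stated assumptions, not the circuit itself: each phase detector is treated as linear (error equal to the neighbor's phase minus the node's own phase), the loop filter is reduced to a constant gain (a first-order loop), phases and frequencies are in hypothetical normalized units relative to a reference clock, and node (0, 0) is also compared against that reference, as in the experimental chip. After the transient every node runs at the reference frequency; because this model is only first order, detuned nodes keep static phase offsets proportional to their detuning divided by the loop gain.

```python
import random

def simulate_array(rows, cols, detuning, gain=1.0, dt=0.05, steps=12000):
    """Small-signal model of a rows x cols array of clock circuits.

    detuning maps (r, c) -> free-running frequency offset from the reference.
    Returns (phase, freq) dictionaries, both relative to the reference clock.
    """
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    adj = {n: [] for n in nodes}
    for r, c in nodes:
        for dr, dc in ((0, 1), (1, 0)):          # east and south neighbours
            m = (r + dr, c + dc)
            if m in adj:
                adj[(r, c)].append(m)
                adj[m].append((r, c))
    phase = {n: 0.0 for n in nodes}
    freq = {n: 0.0 for n in nodes}
    for _ in range(steps):
        for n in nodes:
            # Summation circuit: add the (linearised) errors of all attached detectors.
            composite = sum(phase[m] - phase[n] for m in adj[n])
            if n == (0, 0):
                composite += -phase[n]           # extra detector to the reference (phase 0)
            # First-order loop: frequency offset = detuning + gain * composite error.
            freq[n] = detuning[n] + gain * composite
        for n in nodes:
            phase[n] += freq[n] * dt             # oscillator integrates frequency into phase
    return phase, freq

rng = random.Random(1)
detuning = {(r, c): rng.uniform(-0.01, 0.01) for r in range(4) for c in range(4)}
phase, freq = simulate_array(4, 4, detuning)
print(max(abs(f) for f in freq.values()))       # residual frequency error, close to zero
```

The step size is chosen so the explicit update stays stable for this mesh; a real design would instead analyze the continuous-time loop with the actual detector and filter characteristics.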

FIG. 3 illustrates a phase detector circuit, such as the phase detector circuit 16
of FIG. 2, in accordance with embodiments of the present invention, that may provide
sufficient nonlinearity, relatively high gain for small differences in input signal phase,
and improved noise performance at high frequencies. It will be understood, however,
that other phase detector circuit embodiments may also be used. The phase detector
circuit 16 may also detect large frequency differences between signals. As shown in
FIG. 3, the phase detector circuit 16 comprises a first pulse generator circuit 32 and a
second pulse generator circuit 34 that are connected to an arbiter circuit 36. The first
pulse generator circuit 32 comprises a logic circuit that is configured as shown and
receives an input signal S1 at an input terminal thereof and generates a first pulse
signal in response thereto. Similarly, the second pulse generator circuit 34 comprises
a logic circuit that is configured as shown and receives an input signal S2 at an input
terminal thereof and generates a second pulse signal in response thereto.

The NMOS-loaded arbiter circuit 36 comprises transistors M38, M42, M44,
M46, M48, and M52 and inverters I54 and I56, and acts as a nonlinear phase
detector. Transistor M44 and the inverter I54 receive the first pulse signal generated
by the first pulse generator circuit 32. Transistor M52 and the inverter I56 receive the
second pulse signal generated by the second pulse generator circuit 34. When there is
no input phase difference between the signals S1 and S2, the outputs at terminals Y1
and Y2 are substantially balanced. As the phase difference between signals S1 and S2
increases from zero, one output will be asserted for the full duration of an input pulse,
while the other output will be asserted only for the remainder of its input pulse after
the first input pulse ends, which is equal to the phase difference between signals S1
and S2. Thus, the detector may provide relatively high gain near zero phase error, but
the gain may approach zero as the input phase difference approaches the input pulse
width, as shown in FIG. 4.
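The transfer characteristic described above can be summarized by an idealized, noise-free behavioral model. The sketch below reads the text literally: the output driven by the earlier pulse is asserted for the full pulse width W, and the other output only for the phase difference Δ, so the differential output per pulse is sign(Δ)(W - |Δ|) for |Δ| < W and zero otherwise. The function name and normalized units are hypothetical, and a real arbiter resolves the Δ = 0 case through noise and metastability rather than exactly.

```python
def arbiter_output(delta, width):
    """Idealised differential output (Y1 - Y2 assertion time per pulse) of the
    pulse-generator/arbiter phase detector.

    delta: arrival time of the S2 pulse minus arrival time of the S1 pulse.
    width: input pulse width (same units as delta).
    """
    if delta == 0 or abs(delta) >= width:
        # Balanced, or pulses do not overlap so each output sees a full pulse.
        return 0.0
    winner_time = width          # earlier pulse holds its output for the whole pulse
    loser_time = abs(delta)      # later output is asserted only after the winner releases
    diff = winner_time - loser_time
    return diff if delta > 0 else -diff   # delta > 0 means S1 arrived first

for d in (-1.2, -0.6, -0.05, 0.05, 0.6, 1.2):
    print(f"delta = {d:+.2f}  ->  Y1 - Y2 = {arbiter_output(d, 1.0):+.2f}")
```

The jump in output just either side of zero is the high gain near zero phase error, and the linear taper to zero at |Δ| = W is the falling gain shown in FIG. 4.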

The pulse generators 32 and 34 shown in FIG. 3 may enable the arbiter circuit 
36 to provide frequency error feedback. That is, if one input signal is at a higher 
frequency than the other, then its output will be asserted for more input pulses than the 
other. Because the width of the pulses is independent of input frequency, the average 
output voltage corresponds to frequency. Unlike a conventional phase-frequency 
detector, however, the strength of the error signal falls to approximately zero as the 
frequency difference approaches zero. Because the gain is relatively high near zero 
phase error and approaches zero as the input phase difference approaches the input 
pulse width, mode-lock problems may be avoided and large signal phase-locking may 
be enhanced. FIG. 5 shows the large-signal and small-signal behavior of an array of 
clock circuits 14 as the clock signals generated by these clock circuits 14 are 
synchronized with a reference clock. A phase detector may occupy approximately
30 µm × 30 µm of chip area.
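The frequency-detection argument can be made concrete with a first-order estimate. Assuming each output is asserted for one fixed-width pulse per input cycle, and ignoring arbitration between overlapping pulses (whose lead and lag contributions tend to average out as the phase difference sweeps through a full beat cycle), the average of each output is proportional to input frequency times pulse width. The differential average is then proportional to the frequency difference and vanishes as that difference approaches zero. The pulse width and frequencies below are hypothetical.

```python
def average_output(freq_hz, pulse_width_s, v_high=1.0):
    """Mean level of an output asserted for one fixed-width pulse per input
    cycle (arbitration between overlapping pulses is ignored)."""
    return v_high * freq_hz * pulse_width_s

def frequency_error(f1_hz, f2_hz, pulse_width_s, v_high=1.0):
    """Differential average output Y1 - Y2 caused by a frequency mismatch."""
    return (average_output(f1_hz, pulse_width_s, v_high)
            - average_output(f2_hz, pulse_width_s, v_high))

# Hypothetical example: 100 ps pulses, inputs at 1.2 GHz and 1.1 GHz.
err = frequency_error(1.2e9, 1.1e9, 100e-12)
print(f"{err:.3f}")   # 0.010, i.e. 1% of the logic-high level
```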

As discussed hereinabove with respect to FIG. 2, each clock circuit 14 may 
comprise a loop filter circuit 22 that generates a control signal for an oscillator 24. 
Conventional loop filters may use a charge pump with an RC pole-zero pair and may
place the capacitor and resistor off chip. To avoid the series resistor of a charge pump
with passive RC compensation, a feed-forward compensation method may be used.
As shown in FIG. 6, a loop filter circuit, such as the loop filter circuit 22 of FIG. 2, in
accordance with embodiments of the present invention, comprises two differential
amplifiers A1 and A2. Amplifier A1 comprises transistors M62, M64, M66, M68,
and M72. Amplifier A2 comprises transistors M82, M84, M86, M88, M92, M94,
M96, M98, and M102. It will be understood, however, that other embodiments of
loop filter circuits may also be used. Transistors M74, M76, and M78 are used for
biasing the two amplifiers A1 and A2. Inverters I102 and I104 are connected to the
gate terminals of transistors M92 and M94, respectively. The differential output
currents from the phase detector circuits 16 that are connected to the clock circuit 14
are summed by the summation circuit 18 and provided to nodes In+ and In-, which
drive both amplifiers A1 and A2. Amplifier A1 is a single-stage differential pair, so it
may have a relatively low gain, but its bandwidth may be limited only by $g_m/C_{gs}$,
where $g_m$ is the transconductance and $C_{gs}$ the gate-source capacitance of its
transistors.

Amplifier A2 includes a high-gain cascaded stage driving a common-source
PFET M102. Transistor M98 serves as a large gate capacitor, which sets the
dominant pole of the amplifier A2 such that the stability of the PLL circuit comprising
the clock circuit 14 and one or more phase detector circuits 16 may be enhanced.
Transistor M96 may be biased at relatively low current to boost gain and to provide a
low-frequency pole (e.g., 12 kHz) with a 15 µm × 15 µm gate capacitor. The loop filter
design and feed-forward compensation may allow the loop filter to fit in an area of
15 µm × 45 µm. Each clock circuit 14, comprising a summation circuit 18, a loop filter
circuit 22, and an oscillator 24, may occupy approximately 45 µm × 45 µm of chip
area.
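One way to see why two parallel amplifiers can stand in for an RC pole-zero pair is to model A1 as a flat gain G1 and A2 as a gain G2 with a single dominant pole at fp, with their output currents adding at the shared output node. Then H(f) = G1 + G2/(1 + jf/fp) = (G1 + G2)(1 + jf/fz)/(1 + jf/fp), with a zero at fz = fp(G1 + G2)/G1. This idealized model is an assumption for illustration; the 12 kHz pole is the example value from the text, while G1 and G2 are hypothetical.

```python
def feedforward_filter(f_hz, g1, g2, fp_hz):
    """Idealised feed-forward loop filter: wideband low-gain path A1 (gain g1)
    in parallel with high-gain path A2 (gain g2, dominant pole at fp_hz)."""
    s = 1j * f_hz / fp_hz            # normalised complex frequency
    return g1 + g2 / (1 + s)

def zero_frequency(g1, g2, fp_hz):
    """Zero introduced by summing the two paths: fz = fp * (g1 + g2) / g1."""
    return fp_hz * (g1 + g2) / g1

FP_HZ = 12e3          # example pole frequency quoted in the text
G1, G2 = 1.0, 99.0    # hypothetical path gains

fz = zero_frequency(G1, G2, FP_HZ)
for f in (1e2, FP_HZ, fz, 1e8):
    h = feedforward_filter(f, G1, G2, FP_HZ)
    print(f"{f:10.3g} Hz  |H| = {abs(h):7.2f}")
```

Below fp the response approaches G1 + G2; above fz it flattens to G1, so the zero restores phase margin in the way the series resistor does in a passive RC filter, without needing that resistor.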

One metric that may be used in the design of oscillator circuits for clock
generation is jitter, and power supply noise may be a primary contributor to jitter.
Accordingly, the oscillator 24 may be designed to reduce the effects of power
supply noise. As shown in FIG. 7, an oscillator, in accordance with embodiments of
the present invention, may use an NMOS-loaded differential ring oscillator as a
voltage controlled oscillator (VCO) to reduce sensitivity to power supply noise.
Transistors M112, M114, M116, M118, and M122 comprise a differential inverter
M108, which drives an inverter chain M132. Transistors M114 and M118 form a
differential pair, and the tail current is driven by transistor M116. The control signal
Vctrl, which is output from the loop filter circuit 22, is received at the drain terminal
of the transistor M128 and the gate terminal of the transistor M116. Transistors M112
and M122 act as the NMOS load. The NMOS load may allow fast oscillation and may
shield the output signal from noise from the power supply Vdd. The voltage Vbias is a
low-pass version of Vdd generated by subthreshold leakage through the PFET M124.
Supply noise, which may be coupled in through the gate-to-drain capacitance (Cgd) of
transistors M112 and M122, may be bypassed by transistor M126. Advantageously,
Vbias may have reduced noise at high frequencies. The oscillation frequency may
depend on the supply voltage and on Vbias through capacitor nonlinearity. The
feedback of the PLL (i.e., a clock circuit 14 and one or more phase detector circuits
16) may compensate for slow frequency variations that may be caused by variations in
the supply voltage.

Attorney Docket 111^5347-205
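The benefit of deriving Vbias as a low-pass version of Vdd can be illustrated with a first-order RC model, treating the subthreshold leakage path through M124 as a very large resistance and the bypass transistor M126 as a capacitance. Both component values below are hypothetical and chosen only to show the trend: supply ripple well above the corner frequency reaches Vbias strongly attenuated, while slow supply variations pass through and are left to the PLL feedback.

```python
import math

def lowpass_gain(f_hz, corner_hz):
    """Magnitude of a first-order low-pass response H(f) = 1 / (1 + j f / fc)."""
    return 1.0 / math.sqrt(1.0 + (f_hz / corner_hz) ** 2)

# Hypothetical values: leakage path modelled as R, bypass device M126 as C.
R_LEAK_OHM = 1e9      # assumed effective resistance of the leakage path
C_BYPASS_F = 1e-12    # assumed bypass capacitance
fc = 1.0 / (2 * math.pi * R_LEAK_OHM * C_BYPASS_F)   # corner frequency, ~159 Hz

for f in (1e2, 1e6, 1e9):
    print(f"{f:9.0e} Hz supply ripple -> {lowpass_gain(f, fc):.2e} of it on Vbias")
```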

Experimental Results 

The following experimental results are provided as an example and shall not
be construed as limiting the present invention. An experimental chip has been
fabricated with a 4 × 4 array of nodes (i.e., clock circuits 14) and a phase detector
circuit 16 between nearest neighbors. Counting one clock circuit 14 and two phase
detector circuits 16, the area overhead is approximately 0.0038 mm² per tile. A phase
detector circuit 16 placed between one of the nodes and the chip clock input locks the
clock distribution network to an external reference. The respective outputs of the 16
oscillators 24 are divided by 64 and driven off chip. At VDD = 3 V, the divided
outputs achieve frequency lock at approximately 17 MHz to 21 MHz, corresponding to
oscillator phase lock at approximately 1.1 GHz to 1.3 GHz. An oscilloscope plot of
four locked output signals is shown in FIG. 8. Long-term jitter between neighboring
tiles is less than approximately 30 picoseconds rms. Cycle-to-cycle jitter is less than
approximately 10 picoseconds. The oscillators, amplifiers, and biasing circuitry draw
approximately 130 mA at 3 V.
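The experimental figures are mutually consistent, as a short calculation using only numbers stated in this description shows: the approximately 30 µm × 30 µm phase detector and 45 µm × 45 µm clock circuit footprints reproduce the per-tile area, the divide-by-64 outputs map back to the oscillator range, and the supply current gives the total power. The variable names are illustrative.

```python
# Figures quoted in this description (not new measurements).
PD_AREA_UM2 = 30 * 30          # one phase detector circuit, ~30 um x 30 um
CLOCK_AREA_UM2 = 45 * 45       # summation circuit + loop filter + oscillator, ~45 um x 45 um
tile_area_mm2 = (CLOCK_AREA_UM2 + 2 * PD_AREA_UM2) / 1e6   # um^2 -> mm^2

DIVIDE_RATIO = 64
locked_range_mhz = (17, 21)
oscillator_range_ghz = tuple(f * DIVIDE_RATIO / 1000 for f in locked_range_mhz)

SUPPLY_V = 3.0
CURRENT_A = 0.130
power_mw = SUPPLY_V * CURRENT_A * 1000

print(f"tile area ~{tile_area_mm2:.4f} mm^2")   # ~0.0038 mm^2
print(f"oscillators at {oscillator_range_ghz[0]:.2f}-{oscillator_range_ghz[1]:.2f} GHz")
print(f"power ~{power_mw:.0f} mW")
```

The resulting 1.088 GHz to 1.344 GHz range agrees with the stated oscillator lock range of approximately 1.1 GHz to 1.3 GHz, and the oscillators, amplifiers, and biasing circuitry together dissipate roughly 390 mW.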

From the foregoing it can readily be seen that clock distribution circuits, in 
accordance with embodiments of the present invention, may provide a distributed 
clock network in which the clock signal is regenerated at each node or tile. As a 
result, chip-length clock lines that may couple in jitter may be avoided. Skew may be 
limited to that resulting from asymmetries in one or more phase detector circuits 
instead of mismatches in physically separated buffers. Furthermore, the performance 
of the clock distribution network may scale with improvements in device speed rather 
than the generally slower improvements in on-chip interconnect speed. 

Many variations and modifications can be made to the preferred embodiments 
without substantially departing from the principles of the present invention. All such 
variations and modifications are intended to be included herein within the scope of the 
present invention, as set forth in the following claims. 




